WO 99/27682 PCT/US98/24355 
METHOD AND APPARATUS FOR SURVEILLANCE 
IN COMMUNICATIONS NETWORKS 



Background of the Invention 

Field of the Invention 

The present invention is directed to a method and apparatus for providing surveillance 
capabilities in a communications network, where the surveillance decisions are made 
automatically by an analysis of data traversing the network. 

Description of Related Art 

There is a large amount of traffic flowing through today's computer networks, and not all 
of this traffic is benign. Thus, the owner or supervisor of the network may need to "listen in on" 
network communications in order to effectively monitor and secure the network. Such 
monitoring or surveillance can be achieved by connecting a probe to the network in order to 
monitor data traveling between two or more nodes (e.g., user workstations) on the network. 

Currently, the task of surveillance is "knowledge-intensive," in that human operators 
generally decide when it is advisable to survey, whom to survey, how long to survey, what kind 
of information to look for, and how to survey (i.e., where to place the network probes). Thus the 
surveillance task, as currently known, requires considerable intervention on the part of a human 
operator. 

In a system where communication between two nodes takes the form of discrete packets, 
the network probe can "read" a packet of data in order to discover information such as the source 
and destination addresses of the packet, or the protocol of the packet. In addition, over time, 
measurements can be computed such as the average or total amount of traffic of a certain 
protocol type during a specific week, or a total number of packets sent to or from a node. This 
information may then be reported to a system administrator in real-time, or may be stored for 
later analysis. 
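
By way of illustration only, the measurements described above can be sketched as follows. The PacketHeader record and its field names are assumptions made for this example and are not defined in this specification; the sketch simply tallies the volume per protocol and the number of packets sent to or from each node.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketHeader:
    """Hypothetical header fields a probe could read from one packet."""
    source: str
    destination: str
    protocol: str
    size_bytes: int


def summarize(headers):
    """Return (bytes per protocol, packets seen per node)."""
    bytes_per_protocol = Counter()
    packets_per_node = Counter()
    for h in headers:
        bytes_per_protocol[h.protocol] += h.size_bytes
        # A packet counts once for its sender and once for its receiver.
        packets_per_node[h.source] += 1
        packets_per_node[h.destination] += 1
    return bytes_per_protocol, packets_per_node
```

The two counters could be reported immediately or written to storage for later analysis, as the paragraph above describes.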

Clearview Network Window, a software program available from Clear Communications 
Corporation, of Lincolnshire, Illinois, USA, allegedly provides predictive/proactive maintenance, 
intelligent root-cause analysis, and proof-of-quality reports. However, the output is designed for 
network fault management, which is not the same as "tapping" into a communication between 
nodes in the network. Thus, the Clearview system does not allow monitoring of data transferred 
between two nodes in the network with regard to content or characteristics. 

Livermore National Laboratory, Livermore, California, USA, developed a group of 
computer programs to protect the U.S. Department of Energy's computers by "sniffing" data 
packets that travel across a local area network. The United States Navy used one of these 
programs, known as the "iWatch" program, in order to wiretap the communications of a 
suspected computer hacker who had been breaking into computer systems at the U.S. 
Department of Defense and NASA. The iWatch program uses a network probe to read all 
packets that travel over a network and then "stores" this information in a common data 
repository. A simple computer program can then be written to read through the stored data, and 
to display only "interesting" information. What may be "interesting" is determined by the 
individual preparing the program and is defined in different ways, e.g., "login names that do not 
belong to the following: {X, Y, Z, . . .}." Whenever an interesting piece of information is found 
within the stored data, the stored data is rescanned and a specific number of characters on both 
sides of the "interesting" piece are reported. These interesting characters are then reviewed in 
order to determine the content of the message and as a guide to future monitoring activity. 
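
The reporting step described above can be pictured with the following sketch, which is not the iWatch code itself. It assumes that the stored data is available as a text string and that "interesting" means a login name outside an allowed set; the function name, the "login: <name>" record format and the context width are all hypothetical.

```python
import re


def report_interesting(stored, allowed_logins, context=20):
    """Return text excerpts around login names that are not in allowed_logins."""
    excerpts = []
    # Assumed record format: "login: <name>" somewhere in the captured text.
    for match in re.finditer(r"login: (\w+)", stored):
        if match.group(1) not in allowed_logins:
            # Report a fixed number of characters on both sides of the hit.
            start = max(0, match.start() - context)
            end = min(len(stored), match.end() + context)
            excerpts.append(stored[start:end])
    return excerpts
```

Because the scan runs over data that has already been stored, a sketch of this kind exhibits the same lag between collection and analysis that is discussed below.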

While the iWatch program appears to have been successful in catching at least one 
computer hacker, it has several limitations. Specifically, the decision to perform a surveillance 
session on a particular communication node was performed by an individual. This requires that 
knowledge be conveyed to the individual and that individual make a judgment to proceed with 
the surveillance. Once the decision to perform the surveillance is started, then all of the data 
which flows through the node is collected. In other words, the data collection step is not 
selective. All of the data is collected and stored in a large database for later analysis. Thus, the 
iWatch method is limited by the size of the database used. In order to provide the most 
flexibility, large storage units must be set aside, increasing the cost and complexity of the iWatch 
system. Further, the analysis of the collected data is not performed in real-time. Rather, the 
software program reads through the stored data in order to determine what is "interesting." Thus, 
there is a lag between the time that the data is collected, and the analysis to determine if there are 
communications which should be monitored. This can be a disadvantage since, many times, in 
order to catch a skilled computer hacker, it is necessary to react immediately to the hacker's 
presence. Finally, once the "interesting" data has been identified in the iWatch system, once 
again, an individual operator must make the determination as to where the network probe will be 
placed in the network in order to "tap" the desired communications. The requirements of human 
intervention are thus key steps in the iWatch surveillance system which reduces its efficiency and 
usefulness. 



Summary of the Invention 

According to the present invention, a method and apparatus are provided for 
automatically and intelligently determining when and how to monitor network activity for 
surveillance purposes. 

In a specific embodiment, the system utilizes two reasoning agents which in combination 
carry out the surveillance task. The inputs and outputs of these agents are defined, but there are 
several ways to construct the agents depending on the reasoning model or paradigm selected. 

In one embodiment, a first reasoning agent receives accounting data from the network 
which includes a list of communications data sent over the network for a specified time period. 
The list may include an identification of both the source and destination of the data, and may 
further identify the protocol used and volume of data sent. 

The output of the first reasoning agent (which is provided as an input to the second 
reasoning agent) may include: whom-to-survey, when-to-survey, and a level-of-surveillance. For 
example, whom-to-survey may be expressed as communications either: a) sent from a given 
source; b) delivered to a given destination; or c) sent between a given source and destination. 
When-to-survey may be expressed as a time interval. Level-of-surveillance may take the form 
of: volume (data units in/out); protocol; and/or content. 

Additional inputs to the second reasoning agent include the network topology and 
locations of network probes. The goal of the second reasoning agent is to determine which 
network probes to activate and the instructions needed to set parameters on these network probes 
in order to monitor, filter and provide the communications of interest (as determined by the 
output of the first reasoning agent). 

By separating the tasks performed by the first and second reasoning agents, and 
constructing each agent to enhance the separate tasks, a more efficient method of surveillance is 
achieved. 

For example, in a preferred embodiment, a rule-based reasoning system is used for the 
first reasoning agent, and a constraint-based reasoning system is used for the second reasoning 
agent, as described in greater detail below. 

Surveillance decisions are thus made automatically rather than having decisions made by 
individuals, and the appropriately programmed tasks analyze the data and implement the 
surveillance. Specifically, the decision points of: 1) whether and whom to tap; 2) what level of 
tapping; 3) where to activate probes in the network; and 4) an interpretation of what is heard, can 
all be automatically accomplished. 

The surveillance system of the present invention can be configured either to act as an 
advisor to a network administrator or to work in a fully-automated mode in which 
decisions are made and necessary actions taken without operator intervention. 

The method and apparatus may be implemented in either a router-based or switch-based 
network, or in a hybrid router/switch-based network. 

These and other features and benefits of the present invention will be set forth in the 
following detailed description and drawings which are given by way of example only and are in 
no way restrictive. 

Brief Description of the Figures 

Fig. 1 is a schematic diagram of a network and system incorporating the present 
invention; 

Fig. 2 is a flowchart representing an overview of operations performed in the present 
invention; 

Fig. 3 is a block diagram representation of one embodiment of the present invention; 

Fig. 4 is a flowchart showing the steps performed in the identification reasoning agent; 
and 

Fig. 5 is a flowchart representing the steps performed in the probe control reasoning 
agent. 

Detailed Description 

A first embodiment of the invention will be described for use in a switch-based network. 
A switch-based network includes a plurality of devices, such as workstations, printers, storage 
devices, servers, etc., connected to one another through a plurality of switches. The switches are 
configured so as to direct a message, usually in the form of a data packet, from a source to a 
destination. For example, in the MMAC-Plus® system available from Cabletron Systems, Inc., 
Rochester, New Hampshire, U.S.A., the switches may reside in a common chassis or be 
distributed amongst more than one chassis. Although a switch-based network is described, one 
of ordinary skill in the art will understand that the present invention can be applied in other types 
of networks. 

As shown schematically in Fig. 1, a switched network 100 includes a plurality of 
switches 102 connected to one another, and a plurality of end nodes 104 each connected to one or 
more of the switches 102. Data between any two end nodes 104 is sent through at least one 
switch 102. A network management system 106 includes a topology service, coupled to the 
network 100 so as to determine the topology of the network and to monitor other network 
functions. Spectrum®, a network management system available from Cabletron Systems, Inc., 
polls the network 100 on a regular basis in order to determine the status of the switches 102 and 
other network devices 104 and maintains information about the topology of the network and 
about the operations of the network devices. 

A processing unit or CPU 108 is connected to the network management system 106 to 
receive information regarding the operation of the network 100. A memory 110 and storage 
device 112 are connected to the processor 108 to provide temporary and permanent storage, 
respectively, of information required by the processor 108. In one embodiment, processor 108 
may be running VLAN Manager software available from Cabletron Systems, Inc., which enables 
"virtual" LANs to be established between different groups of users and/or applications. A 
display unit 114 is connected to the processor 108 so as to display, generally in graphic form, a 
representation of the network including its topology and functions. Through either keyboard 
and/or mouse input devices 116a, 116b, connected to the processor 108, and through the 
interface program of VLAN Manager, a user can perform various analyses of the network, 
control the configuration of the network, e.g., adding or deleting nodes and/or switches as the 
network changes, and monitor data transmissions, as discussed below in more detail. 

The VLAN Manager is run on a processor capable of supporting at least one of Windows 
NT 3.51, Solaris 2.4 and 2.5.1, HP/UX 10.01 and 10.10, AIX 4.0, and IRIX 5.3 operating 
systems. Any one of a number of commercial or proprietary processors may be used. Generally, 
the CPU platform 108 requires a minimum of sixty-four Megabytes of RAM, 100 Megabytes of 
swap space and 150 Megabytes of available disk drive space. 

If a user wishes to monitor data or communications between, for example, a source node 
104S and a destination node 104D in the switched network (see Fig. 1), the user may connect a 
data analyzer or probe 118 to the network to review the "tapped" data. As disclosed in 
commonly assigned and co-pending U.S. patent application Serial No. 08/790,473, entitled 
"Method and Apparatus to Establish a Tap-Point In a Switched Network Using Self-Configuring 
Switches Having Distributed Configuration Capabilities," by Liessner et al., (hereinafter 
"Liessner") which is herein incorporated by reference in its entirety, a user can plug the probe 
118 into any switch 102 in the network to which the user has convenient access. Alternatively, a 
tap-point can be established as disclosed in commonly assigned U.S. patent application Serial 
No. 08/370,158 entitled "Use of Multipoint Connection Services to Establish Call-Tapping 
Points in a Switched Network," by Dev et al., (hereinafter "Dev") which is also hereby 
incorporated by reference in its entirety. In either approach, a probe or tap-point can be 
established which either receives specific transmissions within the network or is configured to 
receive all data transmitted by the network. 

The probe 118 includes a memory 120 and a storage device 122. In the systems 
referenced above, the probe 118 may be considered just another device in the switched network, 
similar to the workstations, printers, storage devices, servers, etc. In addition, there may be 
multiple probes connected to the switch and/or at other points in the network. As shown, the 
probe 118 communicates with the CPU 108 over interface 119. 

As an overview of the operation of the present invention, a flowchart as shown in Fig. 2 
will be referenced. In step 200, accounting data (AD) is received by the processor 108. The 
accounting data consists of a list of communications over the network for some specified time 
period. The list may consist of source/destination pairs or may consist of further information 
such as the communications protocol used and volume of communications for each pair. As the 
accounting data is received, in step 202, the data is analyzed. 
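
As a minimal sketch of the accounting data (AD) described above, the list of communications for a specified time period might be represented as follows. The class names, the field names and the use of epoch seconds for the period are assumptions made for the example; the specification does not prescribe a particular representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccountingRecord:
    """One entry in the accounting data: a source/destination pair."""
    source: str
    destination: str
    protocol: Optional[str] = None   # optional further information
    volume: Optional[int] = None     # optional, e.g. data units exchanged


@dataclass(frozen=True)
class AccountingData:
    """Accounting data covering one specified time period."""
    period_start: float              # assumed: seconds since the epoch
    period_end: float
    records: tuple

    def pairs(self):
        """Return the distinct source/destination pairs in this period."""
        return {(r.source, r.destination) for r in self.records}
```

The protocol and volume fields default to None so that the same record can carry either bare source/destination pairs or the further information mentioned above.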

In the present invention, at step 204, traffic on the network which merits further attention 
is identified. This identification is accomplished automatically and in real-time by the 
application of reasoning paradigms, e.g., rule-based reasoning, case-based reasoning, constraint- 
based reasoning, fuzzy logic or neural net analysis. Additional discussion of these and other 
reasoning paradigms can be found in Artificial Intelligence: A Modern Approach by Stuart 
Russell and Peter Norvig, Prentice Hall, New Jersey, 1995. By application of any one or more of 
these reasoning approaches, any traffic on the network which is "suspect" or which requires 
further analysis is automatically identified. The parameters which define "suspect" traffic or 
transmissions within the network are set within the reasoning system, as discussed below in more 
detail. 

Once network traffic or data to be tapped or monitored is identified in step 204, the 
network probe or probes, and/or network switch or switches, are configured in step 206 in order 
to collect the identified data. The identification of the probes and/or switches to be used and/or 
configured is determined from an analysis of the topology of the network in combination with 
the system being used for setting up a tap which, as above, can be either the Liessner or Dev 
systems referenced above. The determination as to how to configure the probe and/or switches is 
also based upon an application of reasoning approaches which were discussed with regard to step 
204. Of course, the criteria for determining which switches and probes to use in order to tap into 
a given connection in the switched network differs from those used in establishing the criteria for 
identifying the traffic to be monitored in step 204. Once the probe and switches have been 
configured, in step 208, the identified traffic is "tapped" and stored for analysis. In this manner, 
the occurrence of network traffic which merits further attention can be automatically identified 
without the intervention of an operator and thus accomplished in real-time. 

As used in this specification, "real-time" is a matter of degree and not a true/false 
absolute. Real-time in the short term involves reasoning about those tasks that require close to 
instantaneous action, with minimal time to think about options, plans, strategies, etc. Real-time 
in the long term involves reasoning about tasks for which there is time to think about options, 
plans, etc., i.e., tasks for which action is not urgent. 

Within the processing unit 108, the functions as disclosed in steps 202 and 204 are 
accomplished within an Identification Reasoning (IR) agent 300 as shown in Fig. 3. The IR 
agent 300 can be implemented as a software program operating within the processing unit 108. 
The operation of configuring network probes and/or network switches in order to tap identified 
traffic as per step 206 is performed within a Probe Control Reasoning (PCR) agent 302, which is 
coupled to the IR agent 300. Similar to the IR agent 300, the PCR agent 302 is a software 
program which operates on the output from the IR agent 300. 

As shown in Fig. 3, the IR agent 300 receives accounting data 304 as an input along with 
identification reasoning (IR) parameters 306. The IR parameters 306 are determined by an 
operator and are the criteria used by the IR agent 300 in order to identify network traffic or data 
which merits further attention. The IR parameters 306 include, but are not limited to, particular 
user names, logical source or destination addresses, physical source or destination addresses, 
traffic volume thresholds which when exceeded may cause further analysis, communications 
from or to particular nodes in the network, communications between particular nodes (the classic 
"wire-tap"), and communications routed through a particular switch or switches in the network. 
While nodes are being represented in the preferred embodiment, the present invention would also 
be applicable to monitoring data communication from/to particular sources or destinations no 
matter the node at which the source or destination is located since the probe can identify a packet 
by its source or destination address. The accounting data 304 may include, but once again is not 
limited to, communications over the network for a specified time period. This information may 
also include source/destination pairs or may consist of further information such as 
communications protocol and volume of communications for each pair. 

The IR agent 300 monitors traffic in real-time or in a database and is triggered by 
abnormal events. As an example, the IR agent 300 might simply look at all "spikes" or sudden 
increases in a parameter and review the sources and destinations of the message units that caused 
the spike. As a further example, when all traffic data for a particular period of time has been 
downloaded to an accounting database, for example, the IR agent 300 might be programmed to 
look for instances of links with exceedingly high volume. Those links that exceed a 
predetermined threshold would then be chosen for further investigation. 
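
The threshold example above can be sketched as follows, under the assumption that the downloaded accounting data is available as (source, destination, volume) tuples. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict


def links_over_threshold(records, threshold):
    """Return {(source, destination): total volume} for links above threshold.

    `records` is an iterable of (source, destination, volume) tuples.
    """
    totals = defaultdict(int)
    for source, destination, volume in records:
        totals[(source, destination)] += volume
    # Only links whose accumulated volume exceeds the threshold are kept.
    return {link: total for link, total in totals.items() if total > threshold}
```

The links returned by such a function would be the ones chosen for further investigation.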

The IR agent 300 applies the IR parameters 306 to the accounting data 304 in order to 
provide a three part output. Output decision data 307 includes information regarding: 1) who to 
survey; 2) when to survey; and 3) a level of surveillance. The indication of who to survey could 
include, but is not limited to, all communications delivered from a given source, all 
communications delivered to a given destination, or all communications between a given source 
and destination. The level of surveillance may indicate collection of, for example, the volume of 
communication, expressed in data units in or out; the protocol being used by the particular 
message; and/or the contents of the communication, i.e., the message. 
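
One possible representation of the three part output decision data 307 is sketched below. The names Level and Decision, the use of None to mean "any", and the numeric time interval are assumptions made for the example only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    VOLUME = 1      # data units in or out
    PROTOCOL = 2    # protocol used by the message
    CONTENT = 3     # the message itself


@dataclass(frozen=True)
class Decision:
    source: Optional[str]        # None means "any source"
    destination: Optional[str]   # None means "any destination"
    start: float                 # when to survey: start of the interval
    end: float                   # when to survey: end of the interval
    level: Level

    def matches(self, src, dst, timestamp):
        """True if a communication falls within who and when to survey."""
        who = (self.source in (None, src)) and (self.destination in (None, dst))
        return who and self.start <= timestamp <= self.end
```

Setting only the source, only the destination, or both covers the three forms of who to survey listed above.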

The PCR agent 302 receives the who, when and level information from the IR agent 300. 
The PCR agent 302 also receives probe control reasoning parameters 308 and network topology 
information 310. The PCR agent 302 automatically applies the network topology information 
and the reasoning parameters in order to determine probe control output information 312 to 
configure the probes and switches in order to carry out the monitoring of data as per the output 
from the IR agent 300. 

The probe control output information 312 coming from the PCR agent 302 is in a form 
such that the network management system 106 is able to configure the switches so as to 
accomplish the tap. Accordingly, the PCR agent 302 would include information regarding, for 
example, either the Liessner method and apparatus, or the Dev multipoint connection service, so 
that commands can be executed. The PCR agent 302 stores the format structures for a multitude 
of different networks and/or switching protocols. The network topology information 310 would 
then include an indication as to the type of network so that the PCR agent 302 could format its 
probe control information 312 accordingly. Further, a universal standard could be established 
whereby the probe control information 312 is in a standard format which is not specific to any 
particular vendor's network management platform. Any network management platform which 
conforms to the standard would receive this standardized probe control information and translate 
it so that the tapping connections could be established. In this manner, as new network 
management platforms become available, the PCR agent need not be updated since its output is 
of a form that any new network management platform (which complies with the standard) can 
understand. 
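
The idea of a standardized probe control format can be illustrated with the following sketch. The ProbeControl record, the two platforms "A" and "B" and their command formats are entirely hypothetical; no actual network management platform or standard is being described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeControl:
    """Hypothetical vendor-neutral probe control record."""
    probe_id: str
    source: str
    destination: str
    level: str


def translate_for_platform_a(pc):
    """Render the record as a command string for an imaginary platform A."""
    return f"tap probe={pc.probe_id} src={pc.source} dst={pc.destination} level={pc.level}"


def translate_for_platform_b(pc):
    """Render the same record as a dictionary for an imaginary platform B."""
    return {"probe": pc.probe_id, "filter": [pc.source, pc.destination], "mode": pc.level}


# Each platform supplies its own translator; the agent's output is unchanged.
TRANSLATORS = {"A": translate_for_platform_a, "B": translate_for_platform_b}


def dispatch(pc, platform):
    """Translate one standard record for the named platform."""
    return TRANSLATORS[platform](pc)
```

In this arrangement, supporting a new platform means adding a translator on the platform side, while the code that produces ProbeControl records is left untouched.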

Operations within the IR agent 300 will now be discussed in more detail with regard to 
the flowchart shown in Fig. 4. In step 400, the reasoning parameters are programmed into the IR 
agent 300. In a preferred embodiment, a rule-based reasoning system has been used in the IR 
agent 300. 

In step 402, the accounting data, as described above, is received by the IR agent 300. The 
reasoning parameters, according to the rule-based reasoning system, are applied to the received 
accounting data in step 404. In step 406, the who, when and level results, which are the results 
of the application of the reasoning parameters to the accounting data, are output. As long as 
accounting data is received in step 402, steps 404 and 406 are executed. Of course, if necessary, 
step 400 can be executed when the rules of the rule-based reasoning system need to be changed 
or updated. 

A rule-based reasoning system was chosen for the identification reasoning agent since it is 
easier to understand than case-based reasoning, fuzzy logic, neural networks or other 
reasoning paradigms. Further, and more importantly, since the monitoring of a network can be 
expensive, a reasoning paradigm that operates in close to real-time and uses minimal CPU cycles 
is desirable. A one-ply rule-based system satisfies this requirement since it functions in a 
manner similar to a look-up table. There are, however, disadvantages associated with a rule- 
based system since it cannot learn and evolve as the usage of the network evolves. This 
represents a trade-off between thoroughness and speed. Certainly, depending upon the resources 
available and desired thoroughness of analysis, other reasoning systems can be used rather than a 
rule-based system. 

The rules which determine how to identify network communications which are to be 
monitored are established in the IR agent 300. Merely as examples as to how the rules may 
function, the following scenarios are provided: 

Scenario 1: the network in question is proprietary and all of the users and agents send 
short and to-the-point messages. 

Rule for scenario 1: if any packet is more than X bytes long, then the source of the packet 
is suspect. 

Scenario 2: the network is proprietary, and agents always send messages of protocol type Y. 

Rule for scenario 2: if any packet is not of type Y, then the source and destination of the 
packet are suspects. 

Scenario 3: the network is proprietary and it is known that server S should never receive 
any messages, in other words, there should be no attempts to log onto this server S. 

Rule for scenario 3: if any packets have a destination S, then the source of the packet is 
suspect. 
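
The three example rules above can be sketched as a one-ply rule set, in which each rule is applied once to each packet and no rule triggers another. Packets are represented here as dictionaries with "source", "destination", "protocol" and "size" keys, which is an assumption made for the example, as are the function names.

```python
def rule_long_packet(packet, max_bytes):
    """Scenario 1: a packet longer than max_bytes makes its source suspect."""
    return {packet["source"]} if packet["size"] > max_bytes else set()


def rule_wrong_protocol(packet, expected):
    """Scenario 2: an unexpected protocol makes source and destination suspect."""
    if packet["protocol"] != expected:
        return {packet["source"], packet["destination"]}
    return set()


def rule_forbidden_server(packet, server):
    """Scenario 3: any packet addressed to `server` makes its source suspect."""
    return {packet["source"]} if packet["destination"] == server else set()


def apply_rules(packets, max_bytes, expected, server):
    """Apply each rule once to each packet (one-ply) and collect suspects."""
    suspects = set()
    for p in packets:
        suspects |= rule_long_packet(p, max_bytes)
        suspects |= rule_wrong_protocol(p, expected)
        suspects |= rule_forbidden_server(p, server)
    return suspects
```

Since every rule is a single comparison, the cost per packet is constant, which is consistent with the look-up-table behavior noted above.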



The PCR agent 302 is programmed with the reasoning parameters in step 500 as shown 
in Fig. 5. A constraint-based reasoning system has been chosen in the preferred embodiment for 
the PCR agent 302. Constraint-based reasoning was chosen because, at this stage of the 
surveillance task, the required analysis becomes more complex. The constraints imposed on the 
PCR agent 302 are the who to survey, when to survey, level of surveillance information, and the 
network topology information 310 which includes the locations of any available probes. 

A goal of constraint-based reasoning is to satisfy as many of the constraints as possible. 
As an example, the level of surveillance might have to be down-graded from actual content to 
data units in/out in order to satisfy all the other constraints. Alternatively, the who of 
surveillance might have to be down-graded from source and destination to only source. In 
general, there will be several ways to satisfy some, but not all, of the constraints. 
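
By way of illustration only, the down-grading of soft constraints can be sketched as a simple relaxation search. This is not a full constraint solver and is not asserted to be the implementation of the PCR agent 302. The option lists, the function names and the representation of probe capabilities are assumptions made for the example. The order in which the who and the level are relaxed is a policy choice; the sketch tries every relaxation of the who before relaxing the level.

```python
LEVELS = ["content", "protocol", "volume"]          # most to least detailed
WHO = ["source_and_destination", "source_only"]     # most to least specific


def candidates(who, level, prefer_downgrade_who=True):
    """Yield (who, level) options, starting with the request as given.

    With prefer_downgrade_who=True, every relaxation of `who` is tried
    before `level` is relaxed.
    """
    whos = WHO[WHO.index(who):]
    levels = LEVELS[LEVELS.index(level):]
    if prefer_downgrade_who:
        for lv in levels:
            for w in whos:
                yield w, lv
    else:
        for w in whos:
            for lv in levels:
                yield w, lv


def plan_tap(who, level, probes):
    """Pick the first (probe, who, level) that some probe can support.

    `probes` maps a probe name to the set of (who, level) pairs it supports.
    Probe availability is treated as a hard constraint: if no probe supports
    any option, None is returned.
    """
    for w, lv in candidates(who, level):
        for name in sorted(probes):
            if (w, lv) in probes[name]:
                return name, w, lv
    return None
```

For a request of source and destination at content level, a probe that supports only source-level content capture is selected before any option that gives up content.
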

As an example, one of the controls in the constraint-based reasoning system may require that, 
given a choice between down-grading the level of surveillance or who to survey, the who to 
survey setting is always down-graded. It should be noted that the who to survey, when to survey and 
level of surveillance are "soft-constraints." The placement of probes, however, is typically a 
"hard-constraint" and the network topology is an even harder constraint. 

Once the constraint parameters of the PCR agent 302 are established, the network 
topology data is received in step 502. The PCR agent 302 is constantly updated with the network 
topology data so that its perception of the network is accurate. As is known, the topology of a 
network is dynamic and may change over time. The PCR agent 302 must have information 
about the topology of the network in order to make proper connections when attempting to tap 
into communications in the network. In step 504, the who, when and level data are received 
from the IR agent 300. The constraint-based reasoning algorithms are applied to the network 
topology data and the data received from the IR agent 300 in step 506. The output from the PCR 
agent 302, i.e., the probe control data 312, is determined and output in step 508. 

This probe control data is used to control the configuration of switches and probes in the 
network so that the desired data can be monitored. Control then returns to step 502, the receipt 
of the network topology data, and steps 504, 506, 508 are repeated. The network topology data 
is constantly received so that existing taps are maintained in the event that the topology of the 
network changes. In other words, if there is a change to the topology which disrupts the tapping 
of particular network communications, the PCR agent 302 will respond to the topology change 
so as to maintain the tapping of the data. This may involve rerouting communications to a probe, 
using a different probe, or reporting that a tap can no longer be maintained because of a change 
in the topology of the switching system. 
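
The maintenance of a tap across topology changes can be sketched as follows. The sketch assumes that the topology is given as an adjacency mapping, that traffic follows the shortest path found by a breadth-first search, and that each probe is attached to one switch; all names are hypothetical. Rerouting of communications to a probe is omitted, so the sketch covers only keeping the current probe, using a different probe, or reporting that the tap can no longer be maintained.

```python
from collections import deque


def find_path(topology, start, goal):
    """Breadth-first search; returns a list of nodes or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in topology.get(path[-1], ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def maintain_tap(topology, probe_sites, source, destination, current_probe):
    """Re-evaluate one tap after a topology update.

    `topology` maps each node to its neighbors; `probe_sites` maps a probe
    name to the switch it is attached to. Returns (status, probe).
    """
    path = find_path(topology, source, destination)
    if path is None:
        return "tap_lost", None
    on_path = sorted(p for p, switch in probe_sites.items() if switch in path)
    if current_probe in on_path:
        return "unchanged", current_probe
    if on_path:
        return "switched_probe", on_path[0]
    return "tap_lost", None
```

Calling maintain_tap each time new topology data arrives mirrors the return to step 502 described above.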

The two reasoning agents 300, 302 in combination carry out the surveillance task. The 
inputs and the outputs of these agents have been determined, but one of ordinary skill in the art 
can see that there are several ways to construct the reasoning agents depending on the reasoning 
paradigm utilized. Thus, for a preferred embodiment, a rule-based reasoning system was 
selected for the IR agent 300 and a constraint-based reasoning system was chosen for the PCR 
agent 302, however, it is clear that different reasoning systems may be chosen, respectively, for 
the agents. 

Although the present embodiment is disclosed within the operation of a switch-based 
network, it is clear that the invention also applies to router-based networks and hybrid 
router/switch-based networks. Further, as is known, many kinds of network probes are 
commercially available. No assumptions or restrictions regarding vendor-specific probes have been 
made. An example of a commonly available probe is the Intelligent RMON/RMON2 Enterprise 
Probe available from Frontier Software Development, Inc., Chelmsford, MA, USA. This 
Enterprise Probe uses the RMON standard to provide diagnostic operations for complex network 
configurations. 

Having thus described an embodiment of the present invention, various modifications and 
improvements will occur to those skilled in the art which are intended to be part of this 
disclosure and within the scope of the invention. Accordingly, the foregoing description is by 
way of example only and is not intended as limiting. 

1. A method of monitoring data transmitted between at least two nodes in a network, 
the method comprising steps of: 

(a) receiving, in real-time, data transmitted in the network; 

(b) analyzing, in real-time, the retrieved data to identify particular data to be 
monitored; 

(c) monitoring, in real-time, the identified particular data in the network; and 

(d) storing the monitored particular data in a storage device. 

2. The method as recited in claim 1, wherein step (b) comprises a step of: 
applying a reasoning operation to the received data to identify the particular data. 

3. The method as recited in claim 2, wherein the reasoning operation is a rule-based 
operation. 

4. The method as recited in claim 1, wherein the received data comprises 
identification of a source of the retrieved data and identification of a destination of the 
retrieved data. 

5. The method as recited in claim 4, wherein the received data further comprises: 
at least one of a protocol and a volume of data associated with the source and 
destination. 

6. The method as recited in claim 4, wherein step (b) comprises steps of: 
applying a rule-based operation to the received data; and 

identifying at least one node for performing the monitoring of the particular data, 
a time period for the monitoring, and a level of the monitoring. 

7. The method as recited in claim 6, further comprising at least one step of: 
monitoring data delivered to the at least one identified node; 
monitoring data sent from the at least one identified node; and 
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monitoring data sent between the at least one identified node and another node in 
the network. 



8. The method as recited in claim 6, wherein the level of monitoring comprises at 
5 least one of: 

counting a number of data units; 
determining a type of protocol used; and 
determining a content of the particular data. 

9. An apparatus for monitoring data transmitted between at least two nodes in a 
network, the apparatus comprising: 

means for receiving, in real-time, data transmitted in the network; 
means, connected to the receiving means, for analyzing, in real-time, the received 
data and for identifying particular data for monitoring; 
means, connected to the analyzing and identifying means, for monitoring the 
identified particular data in the network; and 
means for storing the monitored particular data. 

10. The apparatus as recited in claim 9, wherein the analyzing and identifying means 
comprise: 

means for applying a rule-based reasoning operation to the retrieved data to 
identify the particular data. 

11. The apparatus as recited in claim 10, wherein the monitoring means comprise: 

means for applying a constraint-based reasoning operation to monitored particular 
data. 



12. The apparatus as recited in claim 10, wherein the received data comprises 
identification of a source of the retrieved data and identification of a destination of the 
retrieved data. 

13. The apparatus as recited in claim 12, wherein the received data further comprises 
at least one of a protocol and volume of data associated with the source and destination. 

14. The apparatus as recited in claim 11, wherein the means for analyzing determines 
at least one of: 

at least one node in the network to perform the monitoring; 
a time period during which the monitoring is to occur; and 
a level of the monitoring. 

15. The apparatus as recited in claim 9, wherein the means for analyzing determines 
at least one of: 

a specific node whose output data is to be monitored; 
a specific node where all data directed to it is to be monitored; and 
a specific source node and a specific destination node wherein all data between 
the specific source and destination nodes is to be monitored. 

16. An apparatus for monitoring data communications in a network, the apparatus 
comprising: 

a first reasoning agent, having a first input to receive accounting data from the 
network and a second input to receive first reasoning parameters, for generating and 
outputting identification data by applying the first reasoning parameters to the accounting 
data according to a first reasoning operation; and 

a second reasoning agent, having a third input to receive the identification data 
from the first reasoning agent, a fourth input to receive second reasoning parameters and 
a fifth input to receive network topology data, for generating and outputting probe control 
data by applying the second reasoning parameters to the identification data and the 
network topology data according to a second reasoning operation. 

17. The apparatus according to claim 16, wherein the identification data comprises at 
least one of: 

data identifying at least one node in the network to monitor; 
data identifying a time period during which monitoring of the at least one 
identified node is to occur; and 

data indicating a level of the monitoring. 

18. The apparatus according to claim 16, wherein the probe control data comprises: 
network switch configuration data. 

19. The apparatus according to claim 16, wherein the first reasoning operation is a 
rule-based operation and the second reasoning operation is a constraint-based operation. 

20. The apparatus according to claim 16, wherein each of the first and second 
reasoning agents comprises: 

a processing unit; and 

a memory unit coupled to the processing unit, the memory unit storing a program 
according to the respective reasoning operation. 

21. An apparatus for monitoring data communications in a network, the apparatus 
comprising: 

a first reasoning agent for identifying data communications within the network to 
be monitored; and 

a second reasoning agent, coupled to the first reasoning agent, for configuring at 
least one switch within the network to achieve the monitoring of the identified data 
communication. 



22. The apparatus as recited in claim 21, wherein: 

the first reasoning agent receives accounting data from the network and outputs 
identification data by applying a first reasoning operation. 

23. The apparatus as recited in claim 22, wherein: 

the second reasoning agent receives the identification data from the first reasoning 
agent and outputs control data by applying a second reasoning operation. 

24. The apparatus according to claim 23, wherein the identification data comprises at 
least one of: 

data identifying at least one node in the network to monitor; 
data identifying a time period during which monitoring of the at least one 
identified node is to occur; and 

data indicating a level of the monitoring. 

25. The apparatus according to claim 23, wherein the probe control data comprises: 
network switch configuration data. 

26. The apparatus according to claim 23, wherein the first reasoning operation is a 
rule-based operation and the second reasoning operation is a constraint-based operation. 

27. The apparatus according to claim 23, wherein each of the first and second 
reasoning agents comprises: 

a processing unit; and 

a memory unit coupled to the processing unit, the memory unit storing a program 
according to the respective reasoning operation. 

MULTIPLE PRIORITY BUFFERING IN A COMPUTER NETWORK 

Field of the Invention 

The invention relates to communication networks and, more particularly, to buffering 
received and/or transmitted communication units in a communications network. 

Discussion of the Related Art 

Communication networks have proliferated to enable sharing of resources over a 
computer network and to enable communications between facilities. A tremendous variety of 
networks have developed. They may be formed using a variety of different inter-connection 
elements, such as unshielded twisted pair cables, shielded twisted pair cables, shielded cable, 
fiber optic cable, even wireless inter-connect elements and others. The configuration of these 
inter-connection elements, and the interfaces for accessing the communication medium, may 
follow one or more of many topologies (such as star, ring or bus). A variety of different 
protocols for accessing the networking medium have also evolved. 

A communication network may include a variety of devices (or "switches") for 
directing traffic across the network. One form of communication network using switches is 
an Asynchronous Transfer Mode (ATM) network. These networks route "cells" of 
communication information across the network. (While the invention may be discussed in 
the context of ATM networks and cells, this is not intended as limiting.) 

FIG. 1 is a block diagram of one embodiment of a network switch 10. In this 
particular example, the network switch has three input ports 14a-14c and three output ports 
14d-14f. The switch is a unidirectional switch, i.e., data flows only in one direction - from 
ports 14a-14c to ports 14d-14f. A communication unit (such as an ATM cell, data packet or 
the like) may be received on one of the ports (e.g., port 14a) and transmitted to any of the 
output ports (e.g., port 14e). The selection of which output port should receive the 
communication unit may depend on the ultimate destination of the communication unit (and 
may also depend on the source of the communication unit, in some networks). 

Control units 16a-16c route communication units received on the input ports 14a-14c 
through a switch fabric 12 to the applicable output ports 14d-14f. For example, a 
communication unit may be received on port 14a. The control unit 16a may route the 
communication unit (based, for example, on a destination address contained in the 
communication unit) through the switch fabric 12 to the buffer 16e. From there, the 
communication unit is output on port 14e. 

The buffers 16d-16f permit the network switch 10 to reconcile varying rates of 
receiving cells. For example, if a number of cells are received on the various ports 14a-14c, 
all for the same output port 14d, the output port 14d may not be able to transmit the 
communication units as quickly as they are received. Accordingly, these units may be 
buffered. 

A great number of variations on the network switch 10 illustrated in FIG. 1 are 
possible. For example, the control performed by units 16a-16c may be done in a centralized 
manner. As another example, the buffering in 16d-16f may be done on the input ports (e.g., as 
part of control units 16a-16c), rather than for the output ports. Another possibility is to use a 
combined buffer for input and output. This may correspond to pairing an input port with an 
output port. For example, input port 14a could be paired with output port 14d, for the effect of 
a bi-directional port. 

FIG. 2 illustrates buffering using separate receive and transmit buffers at the same 
time. In this example, network port 24 includes both an input port (e.g., port 25a) and an 
output port (e.g., 25d). A buffer 26 is provided for the input port. A separate buffer 28 is 
provided for the output port. Information may be routed through the network switch fabric 22 
between ports, as generally described above. 

FIG. 3 illustrates an alternative embodiment. In this embodiment, combined receive 
and transmit buffers are shown. In this embodiment, the receive buffer 36 and transmit buffer 
are stored in a common memory 35. 

Another alternative would be to provide a receive buffer and a transmit buffer that 
include a shared memory area. Such a system is described in copending and commonly 
owned United States Patent Application Serial No. 08/847,344, entitled Method And 
Apparatus For Adaptive Port Buffering, filed April 24, 1997, by Steve Augusta et al., which is 
hereby incorporated by reference in its entirety. 

In many networks, all communication units are treated equally - i.e., all 
communication units are assumed to have the same priority in traveling across a network. 
Alternatively, various levels of quality of service ("QoS") may be provided. This has been 
applied in ATM networks, although the concept may be applied in other contexts. 

In one example, different services offered over the network may have different 
transmission requirements. For example, video on demand may require high quality service 
(to avoid jerking movement in the video), while e-mail allows a lower quality of service. 
Subscribers may be offered the option to pay higher prices for higher levels of quality of 
service. 

Summary of the Invention 

According to one embodiment of the present invention, a buffer element for a 
communication network is disclosed. A first buffer memory is provided to store 
communication units corresponding to a first quality of service (QoS) level. A second buffer 
memory stores communication units corresponding to a second quality of service level. A 
buffer manager is coupled to the first buffer memory and the second buffer memory. A depth 
adjuster may be provided to adjust corresponding depths of the first buffer memory and the 
second buffer memory. 

According to another embodiment of the present invention, a switch for a 
communication network is disclosed. The switch includes a plurality of ports, a first buffer 
memory coupled to one of the ports to store communication units corresponding to a first 
quality of service level and a second buffer memory coupled to the one of the ports to store 
communication units corresponding to a second quality of service level. 

According to another embodiment of the present invention, a method of buffering 
communication units in a communication network is disclosed. According to this 
embodiment, a queue depth is assigned for each of a plurality of queues, each queue being 
designated to store communication units of a predetermined quality of service level. The 
plurality of queues is provided, each having the corresponding assigned depth. One of the 
queues is selected to receive a communication unit, based on a quality of service level 
associated with the communication unit. The communication unit may then be stored in the 
selected queue. This embodiment may further comprise a step of adjusting queue depths. 

According to another embodiment of the present invention, a method of selecting a 
communication unit for transmission in a communication network that provides a plurality of 
quality of service levels is disclosed. In this embodiment, the communication unit is selected 
from a plurality of communication units stored in a buffer, the buffer including a plurality of 
queues, each queue corresponding to one of the quality of service levels. The method of this 
embodiment includes the steps of identifying the queue with the highest corresponding quality 
of service level and which is not empty, and then selecting the communication unit from the 
identified queue. 

According to another embodiment of the present invention, a method of storing a 
communication unit in a buffer is disclosed. According to this embodiment, the 
communication unit has one of a plurality of quality of service levels and the buffer includes a 
plurality of queues, each queue corresponding to one of the quality of service levels. 
According to this embodiment, the method comprises steps of determining the quality of 
service level of the communication unit and storing the communication unit in the queue 
having the corresponding quality of service level of the communication unit. According to 
this embodiment, the communication unit may be dropped when the queue having the 
corresponding quality of service level of the communication unit is full (or alternatively 
placed in a queue for a lower quality service). 

Brief Description of the Drawings 

FIG. 1 illustrates one embodiment of a network switch in a communication network. 

FIG. 2 illustrates one embodiment of buffering for a switch. 

FIG. 3 illustrates another embodiment of buffering for a switch. 

FIG. 4 illustrates one embodiment of a buffer element according to the present 
invention. 

FIG. 5 illustrates one embodiment of a network switch according to the present 
invention. 

FIG. 6 illustrates one embodiment of a method for receiving cells using the buffering 
element illustrated in FIG. 4. 

FIG. 7 illustrates one embodiment of retrieving cells from a buffer element such as 
that shown in FIG. 4. 

FIG. 8 illustrates one embodiment of a method for determining depth assignments for 
a buffering element. 

FIG. 9 illustrates one embodiment of a graphical user interface for inputting queue 
depth assignment problems. 

FIG. 10 illustrates one embodiment of a buffer element and associated controllers for 
use in a communication network. 



WO 99/57858 PCT/US99/09853 


FIG. 11 illustrates one embodiment of a method for adjusting queue depths during use 
of the communication network. 

Detailed Description 

Design of a communication network (or a switch for use in a communication network) 
that supports various levels of QoS can be a difficult task. One difficulty is determining the 
quality of a particular implementation. Generally, the design of a communication network 
may pursue the following (sometimes conflicting) goals: 1) Accommodating traffic through 
the network; 2) Making efficient use of the network facilities; 3) Ensuring that network 
performance reflects the appropriate QoS levels. 

Two potential measures of the quality of service offered include cell loss rate (CLR) 
and cell transfer delay (CTD). CLR reflects the number of cells that are lost. For example, if 
more cells arrive at a switch than can be accommodated in the switch's buffer, some cells may 
be lost. 

CTD corresponds to the amount of time a cell spends at a switch (or other storage 
and/or transfer device) before being transmitted. For example, if a cell sits in a buffer for a 
long period of time while other (e.g., higher QoS level) cells are transmitted, the CTD of the 
delayed cell is the amount of time it spends in the buffer. 

In the embodiment described below, mean cell loss rate (CLR) and mean cell transfer 
delay (CTD) are used to measure the quality of service. Of course, a number of variations on 
these measures, as well as other measures, could be used. For example, cell delay variation 
(the amount of variation in cell delay) or maximum CTD (rather than average CTD) could be 
used as alternative or additional measures. 

FIG. 4 illustrates one embodiment of a buffer element for use in a network 
accommodating multiple QoS levels. A buffering mechanism 40 is provided at a switch port, 
such as the buffering element 16d at port 14d of FIG. 1. In that particular example, the 
buffering occurs at an output port 14d. In alternative embodiments, buffering may be 
associated with an input port (e.g., 14a-14c of FIG. 1) or both input and output ports. 

In the example of FIG. 4, the buffering element 40 includes four queues (also referred 
to as buffers) 43a-43d. Each queue is composed of a storage component, such as a random 
access memory (or any other storage device). Each queue 43a-43d is associated with a 
particular QoS level for the network. Thus, in the example of FIG. 4, there are four QoS 
levels. Queue 1 (43a) corresponds to the highest QoS level. Queue 2 (43b) corresponds to the 
second highest QoS level. Queue 3 (43c) corresponds to the third highest QoS level. Queue 4 
(43d) corresponds to the lowest QoS level. 

Each of the queues 43a-43d also has an associated depth. The depth corresponds to 
the amount of information that can be stored in the particular queue. Where incoming cells 41 
have a fixed length, the depth of the queue may be measured by the number of cells that can 
be stored in that queue. 

In FIG. 4, queue 1 (43a) has a depth D1. Queue 2 (43b) has a depth D2. Queue 3 (43c) 
has a depth D3. Queue 4 (43d) has a depth D4. Each of the depths D1-D4 may be of a 
different size. When incoming cells 41 are directed to the port, a sorter 44 assigns each cell to 
the appropriate queue 43a-43d based on the QoS of the cell. In most cases, the QoS of the cell 
will be indicated in an information field within the cell itself. 

When a cell can be transmitted from the port, a merge unit 45 selects the appropriate 
cell for transmission. While the sorter 44 and merge unit 45 are shown as separate 
components, these may be implemented in a number of ways. For example, the sorter and 
merge unit may be separate hardware components. In another embodiment, the sorter 44 and 
merge unit 45 may be programmed on a general purpose computer coupled to the memory or 
memories storing queues 43a-43d. In another embodiment, a common merge unit is used for 
all of the ports (particularly where buffering is done on an input port). 

The queues 43a-43d may be implemented using separate memories. In the alternative, 
the queues may be implemented in a single memory unit, or shared across multiple shared 
memory units. The memory units may be conventional random access memory devices or any 
other storage element, such as shift registers or other devices. 

FIG. 5 illustrates one embodiment of a switch 50 that includes buffering elements 
53a, 53b, 54a, 54b, 55a, 55b, 56a and 56b, similar to those illustrated in FIG. 4. The 
embodiment of FIG. 5 has four input ports 51a-51d and four output ports 52a-52d (and hence 
is a 4x4 switch). 

In the example of FIG. 5, there are only two QoS levels. In this example, each output 
port 52a-52d has two associated queues (one for each QoS level). For example, output port 
52a has two associated queues 53a and 53b. Again, while this embodiment illustrates 
buffering on the output ports, buffering could instead be done on the input ports or on both 
input and output ports. In addition, while FIG. 5 illustrates queues 53a-56b as separate 
devices, they may be stored in one, or across several, memory chips or other devices. 

FIG. 6 illustrates one embodiment of a process for receiving cells at a buffering 
element, such as receiving incoming cells 41 at buffering element 40 of FIG. 4. The process 
begins at a step 60 when a cell is received. At a step 61, the appropriate QoS level for the cell 
is determined. This may be done, for example, by examining a field in the cell that specifies 
or otherwise indicates the QoS level. 

At a step 62, it is determined whether there is room in the appropriate QoS buffer to 
receive the cell. If so, the cell is stored in the buffer, at a step 63. If there is no room in the 
appropriate QoS buffer, the cell is dropped at a step 64. 
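
The receive process of FIG. 6 might be sketched as follows. This is an illustrative sketch only: the class name `BufferElement`, the method `receive`, and the representation of a cell as a `(qos, payload)` tuple are assumptions introduced here, with QoS level 0 taken to be the highest level. 

```python
from collections import deque


class BufferElement:
    """Hypothetical buffer element: one bounded FIFO queue per QoS level."""

    def __init__(self, depths):
        # depths[j] is the depth (in cells) of the queue for QoS level j.
        self.depths = list(depths)
        self.queues = [deque() for _ in self.depths]

    def receive(self, cell):
        """Store the cell in its QoS queue (True) or drop it (False)."""
        qos, _payload = cell                 # step 61: read the QoS field
        queue = self.queues[qos]
        if len(queue) < self.depths[qos]:    # step 62: room in the queue?
            queue.append(cell)               # step 63: store the cell
            return True
        return False                         # step 64: drop the cell


buf = BufferElement(depths=[2, 1])
results = [buf.receive((0, "a")), buf.receive((1, "b")), buf.receive((1, "c"))]
```

In this usage, the third cell is dropped because the level-1 queue has a depth of one cell and is already full. 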

Of course, a number of variations on this process may be developed. As just one 
example, if there is no room in the appropriate QoS buffer (step 62), buffers of a lower 
priority could be examined. If there is room in a lower priority buffer, the cell could be stored 
in that buffer (additional steps may be taken when order of cell transmission is important, 
such as taking cells from the queue out of FIFO order). In any event, a number of variations 
and optimizations may be made to the embodiment of FIG. 6. 
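
The lower-priority fallback described above could look roughly like the sketch below. The function name `receive_with_spill` and the `(qos, payload)` cell tuple are hypothetical, index 0 is assumed to be the highest QoS level, and the sketch does not attempt the additional steps needed to preserve cell order. 

```python
from collections import deque


def receive_with_spill(queues, depths, cell):
    """Store the cell in its own QoS queue or, if that queue is full,
    in the next lower-priority queue that has room.

    Returns the index of the queue used, or None if the cell is dropped.
    """
    qos, _payload = cell
    for level in range(qos, len(queues)):    # own level first, then lower ones
        if len(queues[level]) < depths[level]:
            queues[level].append(cell)
            return level
    return None                              # no room anywhere: drop


queues = [deque(), deque()]
depths = [1, 1]
placed = [receive_with_spill(queues, depths, (0, p)) for p in ("a", "b", "c")]
```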

FIG. 7 illustrates one embodiment of a method for retrieving cells stored in a buffering 
element, such as selecting the outgoing cells 42 of FIG. 4. 

In this particular embodiment, the top level queue is selected first (e.g., queue 43a of 
FIG. 4), at a step 70. 

At a step 71, it is determined whether the selected queue is empty. If so, the next 
queue is selected (at a step 73), and examined to determine if it is empty (step 71). 

Once a queue that is not empty has been found, one (or more) cells from that queue are 
transmitted at a step 72. In this particular embodiment, after a cell has been transmitted, the 
top level queue is again examined. Accordingly, the effect of the embodiment in FIG. 7 is to 
transmit cells from the highest level queue that is holding cells, until there are none left. 
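
A minimal sketch of this strict-priority retrieval is given below, assuming the queues are held in a list ordered from the highest QoS level to the lowest. The function name `select_cell` is an assumption made for illustration. 

```python
from collections import deque


def select_cell(queues):
    """Return the next cell from the highest-level non-empty queue, or None."""
    for queue in queues:            # steps 70/73: start at the top, move down
        if queue:                   # step 71: skip empty queues
            return queue.popleft()  # step 72: transmit one cell (FIFO)
    return None


queues = [deque(["h1"]), deque(["m1", "m2"]), deque(["l1"])]
order = []
while (cell := select_cell(queues)) is not None:
    order.append(cell)
```

Because every call restarts at the top level queue, a lower level is served only when all higher levels are empty. 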

A number of variations or alternatives are possible. For example, in the embodiment 
of FIG. 7, a cell in the lowest QoS level queue could be indefinitely frozen from transmission 
by a long stream of cells arriving for higher level QoS queues. An alternative, therefore, 
would be to rotate priority among the QoS levels (e.g., give the highest level QoS queue first 
priority sixty percent of the time, the second highest level priority thirty percent of the time, 
the third highest level priority ten percent of the time and the lowest QoS level priority none 
of the time). Another alternative would be to monitor cell delay and require transmission of 
cells after a certain delay (the delay potentially depending on the QoS level). For example, 
queue 3 could be given highest priority when cells have been sitting in that queue for longer 
than a first period of time, and queue 4 given highest priority when cells have been sitting in 
that queue for a second period of time (in most cases, the period of time for the lower QoS 
levels will be greater than the period of time for the higher QoS levels). Again, a number of 
variations and optimizations are possible. 
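
The delay-monitoring alternative could be sketched as follows. Everything named here is hypothetical: queued entries are assumed to be `(arrival_time, payload)` pairs, `max_wait[j]` is an assumed per-level waiting limit (or `None` for no limit), and overdue queues are checked from the highest level down, which is one possible tie-breaking choice among several. 

```python
from collections import deque


def select_with_delay_limits(queues, max_wait, now):
    """Pick the next cell, giving first priority to a queue whose oldest
    cell has waited longer than that queue's limit; otherwise fall back
    to strict priority. Returns (level, payload) or None.
    """
    # First pass: serve any queue whose head cell is overdue.
    for level, queue in enumerate(queues):
        limit = max_wait[level]
        if queue and limit is not None and now - queue[0][0] > limit:
            return level, queue.popleft()[1]
    # Second pass: ordinary strict priority, highest level first.
    for level, queue in enumerate(queues):
        if queue:
            return level, queue.popleft()[1]
    return None


queues = [deque([(9.0, "high")]), deque([(1.0, "low")])]
first = select_with_delay_limits(queues, max_wait=[None, 5.0], now=10.0)
second = select_with_delay_limits(queues, max_wait=[None, 5.0], now=10.0)
```

Here the low-level cell has waited nine time units against a limit of five, so it is served ahead of the more recently arrived high-level cell. 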

In the embodiment of FIG. 7, cells are removed from the queue on a first in, first 
out ("FIFO") basis. Again, a number of alternatives are possible. For example, if a cell is in 
the highest QoS level queue, but cannot be transmitted, another cell may be selected from the 
highest QoS level queue (or, in the alternative, a cell selected from the next QoS level queue). 
A cell may not be capable of transmission when, for example, the place to which it is being 
transmitted is blocked. One example of this situation occurs when the buffers appear at the 
input ports (e.g., port 14a of FIG. 1). If another port is transmitting a cell to a particular 
output port (e.g., port 14d), no other cell stored at any other input port can be transmitted to 
that same port at the same time. Thus, a cell in the highest QoS level associated with port 14a 
might be blocked from transmission to port 14d by another cell being transmitted to that port. 

Referring again to FIG. 4, the buffering element has M queues, where M stands for the 
number of levels of QoS accommodated by the switch. In the example of FIG. 4, M equals 4. 

Referring again to FIG. 5, an N by N switch is disclosed (in FIG. 5, N=4). Where 
buffers appear only on the output (or input), there may be a total of M x N queues in the 
switch. 

In one embodiment of the present invention, each of the queues may have a different 
depth. That is, the size of each queue may not be the same. In these embodiments, therefore, 
a problem may be posed of how much memory to provide for each queue, to meet system (and 
QoS) requirements. This may be referred to as a queue depth assignment problem. 

In one embodiment, the assignment of depths to each of the queues is based on 
performance and characteristics of the network and switch. The depth assignments should 
satisfy the following equation: 

$$\sum_{i=1}^{N} \sum_{j=1}^{M} D_{ij} \le m$$

where $m$ is the total memory available in the switch and $D_{ij}$ is the depth of the queue at 
port $i$ and QoS level $j$. Thus, the sum of the depths of all of the queues has to be less than or 
equal to the total memory ($m$) available in the switch. As can be seen from this model, the 
depths of the highest quality level queues within the switch may, but need not, be the same. For 
example, referring again to FIG. 1, more memory could be provided for the highest level 
queuing associated with port 14d than with port 14e. 

One way to determine queue depth is to ascertain a mathematical model for the quality 
of the queue depth assignments. The mathematical model can then be solved or used to 
evaluate possible solutions of the depth assignment problem. 

In the following example, an energy function is defined to reflect the measure of the 
quality of the potential solution of the depth assignment problem. In this example, the lower 
the energy function, the better the solution. The energy function is: 

$$E = \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ P_{1j}\, f_1(D_{ij}, \rho_{ij})\, \lambda_{ij} + P_{2j}\, f_2(D_{ij}, \rho_{ij}, \lambda_{ij}) \right]$$

$P_{1j}$ is the constant penalty imposed for a dropped cell on QoS $j$. (For example, with three QoS 
levels, weights 10, 5 and 1 could be respectively assigned as the penalty for dropping a cell of 
the corresponding QoS level.) 

$P_{2j}$ is the penalty imposed for a cell waiting on QoS $j$. (For example, with three QoS 
levels, penalties of 8, 4 and 0 could be assigned for each unit time delay of a cell having the 
corresponding QoS level.) 

$\rho_{ij}$ is the load on port $i$, QoS $j$, which is given by $\rho_{ij} = \lambda_{ij}/\mu_j$. Here, $\lambda_{ij}$ is the arrival rate, 
in packets/sec, on port $i$, QoS $j$, and $\mu_j$ is the processing rate of QoS $j$, also in packets/sec. 

The function $f_1(D, \rho)$ is the cell loss probability. Therefore, $f_1(D, \rho)\,\lambda_{ij}$ corresponds to 
the CLR. The function $f_2(D, \rho, \lambda)$ corresponds to the CTD. 

To use the above energy function, the particular variables of the equation have to be 
filled in. Values of $\lambda_{ij}$ may be determined by observing the traffic over the switch for some 
length of time and averaging arrival rates on each queue. Of course, other methods are 
possible. 

The processing rates $\mu$ of each queue may be determined by the switch's performance 
characteristics (or observed). 

The penalty parameter arrays $P_1$ and $P_2$ may be determined subjectively by the user. 
These values represent the relative importance of minimizing each of the objective measures 
$f_1$ and $f_2$ (e.g., CLR and CTD) for each queue. For example, if $P_1$ = (10, 5, 2, 0), then a 
penalty of ten is imposed for a lost cell on the first QoS level, a penalty of five on the second 
QoS level, a penalty of two on the third QoS level, and no penalty on the fourth QoS level. In 
this example, performance on the fourth QoS level will be sacrificed to improve CLRs of the 
other QoS levels. Similarly, the penalty associated with cell delay $P_2$ needs to be specified for 
each of the QoS levels. 

The M/M/1/K queuing model may be used to predict CLR and CTD. This model is 
discussed, for example, in Kleinrock, L., Queuing Systems, Vol. I: Theory, New York, NY: 
John Wiley & Sons, Inc., 1975, pp. 103-5; and Fu, L., Neural Networks in Computer 
Intelligence, New York, NY: McGraw-Hill, Inc., 1994, pp. 41-5. This model assumes that 
$\rho < 1$, where $\rho$ is the load. The cell loss probability, $f_1$, is given by 

$$f_1(D, \rho) = \frac{(1 - \rho)\,\rho^{D}}{1 - \rho^{D+1}}$$

and the CTD is given by 

$$f_2(D, \rho, \lambda) = \frac{\rho \left[ 1 - (D+1)\rho^{D} + D\rho^{D+1} \right]}{(1 - \rho^{D+1})(1 - \rho)\,\lambda\,(1 - f_1(D, \rho))}$$

(A variety of other models may also be used to predict CLR and CTD. CLR and CTD may 
also be estimated by taking actual measurements on a system while it is performing.) 
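
As a rough numerical sketch of how the energy function might be evaluated under this model, the code below assumes the standard M/M/1/K expressions with K equal to the queue depth: the blocking probability for the cell loss probability, and the mean number of cells in the system divided by the accepted arrival rate (Little's law) for the CTD. The function names and the nested-list layout (`depths[i][j]`, `lam[i][j]`, `mu[j]`) are assumptions made for illustration. 

```python
def cell_loss_probability(depth, rho):
    """f1(D, rho): M/M/1/K blocking probability with K = depth (rho != 1)."""
    return (1 - rho) * rho**depth / (1 - rho**(depth + 1))


def cell_transfer_delay(depth, rho, lam):
    """f2(D, rho, lambda): mean time a cell spends in the queue system."""
    mean_cells = (rho * (1 - (depth + 1) * rho**depth + depth * rho**(depth + 1))
                  / ((1 - rho) * (1 - rho**(depth + 1))))
    accepted_rate = lam * (1 - cell_loss_probability(depth, rho))
    return mean_cells / accepted_rate


def energy(depths, lam, mu, p1, p2):
    """Sum over ports i and QoS levels j of the loss and delay penalties."""
    total = 0.0
    for i, port_depths in enumerate(depths):
        for j, d in enumerate(port_depths):
            rho = lam[i][j] / mu[j]
            total += p1[j] * cell_loss_probability(d, rho) * lam[i][j]
            total += p2[j] * cell_transfer_delay(d, rho, lam[i][j])
    return total


# One port, one QoS level, depth 1, arrival rate 1 and processing rate 2.
example = energy(depths=[[1]], lam=[[1.0]], mu=[2.0], p1=[10], p2=[8])
```

With a depth of one cell the model reduces to a single-server loss system, so the loss probability is ρ/(1+ρ) and the delay is one service time, which gives a convenient sanity check. 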

One possible approach to solving for minimum E is to examine all possible depth 
assignments. As is typical of combinatorial problems of this nature, however, the cost of 
exhaustive search grows factorially. The number of feasible solutions is equal to 

$$\frac{(m-1)!}{(m - NM)!\,(NM - 1)!} = \binom{m-1}{NM-1}.$$

Table 1 below illustrates a few examples to show the growth of this function. 

Table 1. 

Under certain embodiments of the present invention, alternative methods may be used 
to find optimal (or, hopefully, close to optimal) solutions. Thus, neural networks, genetic 
algorithms and other approaches may be used. 
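
To give a feel for why exhaustive search is impractical, the count of feasible solutions can be evaluated directly. The sketch below reads the count as the binomial coefficient "m-1 choose NM-1", i.e. the number of ways to split m memory units among NM queues when every queue receives at least one unit and all of the memory is used. The function name is an assumption, and the values it produces are not taken from Table 1. 

```python
from math import comb


def feasible_solution_count(m, n_ports, n_levels):
    """Number of depth assignments using exactly m units over N*M queues,
    each queue having a depth of at least one."""
    queue_count = n_ports * n_levels
    return comb(m - 1, queue_count - 1)


small = feasible_solution_count(m=4, n_ports=1, n_levels=2)    # (1,3), (2,2), (3,1)
medium = feasible_solution_count(m=10, n_ports=2, n_levels=2)
large = feasible_solution_count(m=1000, n_ports=4, n_levels=4)
```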

In one embodiment of the present invention, a straightforward genetic algorithm is 
used to minimize the above energy function. According to this method, the search starts 
from an initial solution. This initial solution can be any random solution, or may be selected 
intelligently as discussed below. 

The genetic algorithm then uses a mutation operator that may consist of picking a 
random port, subtracting a random number from a randomly selected queue on that port and 
adding that same number to another randomly selected queue depth on the same port. Simple 
single point cross over may be used to combine solutions. In each generation of the genetic 
algorithm, an elite percentage of the population is preserved and used to reproduce the 
remainder of the population using cross over. Half of the offspring may further be mutated a 
number of times. 
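
The mutation operator described above might be sketched as follows. The function name `mutate` and the list-of-lists layout (`depths[port][level]`) are assumptions, as is the choice to keep every queue at a depth of at least one cell, which the text does not specify. 

```python
import random


def mutate(depths, rng):
    """Move a random amount of depth between two queues on one random port.

    depths is a list of per-port lists; a mutated copy is returned and the
    per-port (and therefore total) memory is unchanged.
    """
    new = [list(port) for port in depths]
    port = rng.randrange(len(new))
    src, dst = rng.sample(range(len(new[port])), 2)   # two distinct queues
    if new[port][src] > 1:
        # Assumption: never shrink a queue below a depth of one cell.
        amount = rng.randint(1, new[port][src] - 1)
        new[port][src] -= amount
        new[port][dst] += amount
    return new


rng = random.Random(0)
before = [[4, 3, 2], [5, 5, 5]]
after = mutate(before, rng)
```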

In an alternative embodiment, steepest ascent (or descent; they are the same) 
hill-climbing (SAHC) may be used. This algorithm (in certain environments) may produce 
similar results to that of the genetic algorithm, although in considerably shorter time in certain 
applications. 

Using steepest descent hill-climbing, a local minimum solution can be found by 
following the steepest path down the energy surface - following search paths that provide the 
greatest decreases in the energy function. 

The steepest descent hill-climbing approach may be modified to include random 
jumps. This would permit the algorithm to jump over small "hills" on the energy function 
surface. This process employs the technique called simulated annealing, known in the art. 

The hill-climbing may be achieved by systematically (rather than randomly) 
incrementing each $D_{ij}$ by one and at the same time reducing the depth of a randomly selected 
queue by one (thus keeping the total memory usage constant and equal to $m$). The energy 
function of each potential solution may be evaluated and the best set of queue depths saved. 

For each of the above, an intelligent initial solution can improve the results and/or
reduce the amount of time required to achieve a good solution. In one embodiment, the
solution is initialized to have queue depths D_ij proportional to p_ij (P_1j + P_2j) and
summing to exactly m.
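
One way to obtain integer depths that are proportional to a set of weights and still sum to
exactly m is sketched below. The rounding rule (largest fractional remainder first) is an
assumption made for illustration; the text does not state how rounding is performed. The
function name initial_depths is hypothetical, and each weight stands for the product
p_ij (P_1j + P_2j) of one queue.

```python
def initial_depths(weights, m):
    """Return integer depths proportional to `weights` that sum to exactly m."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    shares = [w * m / total for w in weights]   # ideal, possibly fractional
    depths = [int(s) for s in shares]           # round each share down
    leftover = m - sum(depths)                  # cells not yet handed out
    # Assumption: give the leftover cells to the largest fractional parts.
    order = sorted(range(len(weights)),
                   key=lambda i: shares[i] - depths[i],
                   reverse=True)
    for i in order[:leftover]:
        depths[i] += 1
    return depths
```

For example, weights of 3 and 1 with m = 8 give depths of 6 and 2.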

Thus, FIG. 8 illustrates one embodiment of a method for finding a solution to the
queue depth assignment problem. This embodiment begins at a step 80, where an initial
solution is formed. This solution may be formed as described above, assuming that depths D_ij
are proportional to p_ij (P_1j + P_2j) and sum to exactly m.

At a step 88, the current best solution is mutated to determine if a better potential
solution may be found. The possible solutions are generated at step 88. For each of the
queues at the switch (the queue having an associated depth D_ij), the applicable D_ij is decreased
by one. In addition, the depth of a randomly selected queue is incremented by one. This forms a
new potential solution, moving one storage element from a current existing queue to a new
queue. By both decrementing and adding one, the total memory for the switch remains the
same. (Here, the adding and subtracting of one corresponds to adding and subtracting
sufficient storage to accommodate one additional cell.)

After the new possible solution is generated, its energy function may be evaluated. If
this is the best energy function value encountered so far, this solution is saved and used for the
next iteration (the next time step 88 is performed). Otherwise processing simply continues and
the current solution remains the best one encountered so far. Optionally, in the event of a tie,
the newly generated solution is selected. After examining a variety of potential solutions at step
88, it is determined whether the algorithm has improved the best solution encountered so far
at any point in the last (for example) twenty iterations (twenty times passing through step 88).
If not, the current best solution is taken as the solution to the queue depth problem. If so, the
solution has not been stable for the last twenty iterations, and processing continues by returning
to step 88 (using the current best solution).
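
The loop of FIG. 8 might be sketched as follows. This is a sketch under stated assumptions
rather than the implementation of the embodiment: the depths are held in one flat list (at
least two queues), the energy function is supplied by the caller as a hypothetical callable
named energy, ties are accepted but do not count as improvements, and a max_iters cap
(not in the description) is added as a safety limit.

```python
import random


def sahc(depths, energy, rng, patience=20, max_iters=10000):
    """Hill-climb over queue depths while keeping their sum constant."""
    best = list(depths)
    best_e = energy(best)
    stale = 0                                   # passes without improvement
    for _ in range(max_iters):
        improved = False
        current = best                          # candidates derive from this
        for i in range(len(current)):
            if current[i] == 0:
                continue                        # nothing to take from queue i
            others = [k for k in range(len(current)) if k != i]
            j = rng.choice(others)              # randomly selected other queue
            cand = list(current)
            cand[i] -= 1                        # move one cell from i ...
            cand[j] += 1                        # ... to j, total unchanged
            e = energy(cand)
            if e <= best_e:                     # a tie selects the new solution
                improved = improved or e < best_e
                best, best_e = cand, e
        stale = 0 if improved else stale + 1
        if stale >= patience:                   # stable for `patience` passes
            break
    return best, best_e
```

The patience parameter corresponds to the twenty iterations given as an example in the text.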

FIG. 9 illustrates one embodiment of a graphical user interface that may be used for
solving a queue depth assignment problem. In this particular embodiment, the interface 90
includes an input area 91 and a help area 92. The help area 92 provides a scrollable help
document.

As illustrated at 91, the following fields may be input to frame the queue depth 
assignment problem. A number of switches in the network may be input, as shown at 91a, 
where more than one switch may be present in the switch fabric. 

At 91b, a user may input the number of input and output ports on each switch (N). At
91c, the user may input the number of QoS levels supported by the switch. At 91d, the user
may input the total memory available on each switch. (In this embodiment, the input is in
terms of the number of cells that can be stored in all of the buffers on the switch.)

At 91e, the user may input the penalty for losing a cell on each QoS level. In the
example illustrated in FIG. 9, there are two QoS levels (as shown at 91c). Accordingly, two
different entries need to be made at 91e, one for each QoS level.

Similarly, at 91f, the user inputs the penalties for cell delay on each QoS. As above, 
the number of entries may correspond to the number of QoS levels (again indicated at 91c). 

At 91g, the processing rates (μ) for each quality of service level are input. Finally, at
91h, the arrival rates (λ) for each queue on every switch are input. Thus, in this example,
eight entries need to be made, one for each of the two queues on each of the four output ports.
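
The inputs collected at 91a through 91h could be gathered into a single record and checked for
consistency, as in the hedged sketch below. The class name, field names and validation rules
are hypothetical and chosen only for illustration. The count of eight arrival-rate entries in
the example is reproduced here on the assumption of a single switch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueDepthProblem:
    num_switches: int              # field 91a
    ports_per_switch: int          # field 91b (N)
    qos_levels: int                # field 91c
    memory_cells: int              # field 91d, cells storable per switch
    loss_penalties: List[float]    # field 91e, one entry per QoS level
    delay_penalties: List[float]   # field 91f, one entry per QoS level
    processing_rates: List[float]  # field 91g (mu), one entry per QoS level
    arrival_rates: List[float]     # field 91h (lambda), one entry per queue

    def expected_arrival_entries(self) -> int:
        """One arrival rate per QoS queue on every port of every switch."""
        return self.num_switches * self.ports_per_switch * self.qos_levels

    def validate(self) -> None:
        """Raise ValueError when an input list has the wrong number of entries."""
        per_level = {
            "loss_penalties": self.loss_penalties,
            "delay_penalties": self.delay_penalties,
            "processing_rates": self.processing_rates,
        }
        for name, values in per_level.items():
            if len(values) != self.qos_levels:
                raise ValueError(f"{name} needs {self.qos_levels} entries")
        if len(self.arrival_rates) != self.expected_arrival_entries():
            raise ValueError("arrival_rates has the wrong number of entries")
```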

Tables 2 and 3 below show examples of application of the algorithm of FIG. 8 to the
following queue depth assignment problems. Values for λ were determined by two different
methods to simulate mean and maximum load measures. In Table 2, λ values were
determined by taking the mean of five random numbers. In Table 3, λ values are the
maximum of five random numbers. In both cases, the constraint λ_ij < μ_j is enforced.
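
The two ways of generating arrival rates might be sketched as follows. The distribution of the
random numbers and the way the constraint is enforced are not stated in the text; the sketch
assumes uniform draws scaled by the processing rate and simply redraws if the constraint is
not met. The function name arrival_rate is hypothetical.

```python
import random
from statistics import mean


def arrival_rate(mu, rng, combine):
    """Combine five random draws into one arrival rate strictly below mu."""
    while True:
        # Assumption: each draw is uniform on [0, mu).
        draws = [rng.random() * mu for _ in range(5)]
        rate = combine(draws)
        if rate < mu:          # enforce arrival rate < processing rate
            return rate


rng = random.Random(0)
mus = [100, 60, 30, 15]        # processing rates used in the experiments
mean_load = [arrival_rate(mu, rng, mean) for mu in mus]   # mean-load style
max_load = [arrival_rate(mu, rng, max) for mu in mus]     # maximum-load style
```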

In all experiments, the number of QoS levels was M = 4, with P_1 = (10, 5, 2, 1) and
P_2 = (8, 4, 0, 0). Values of μ were 100, 60, 30, 15. The Percent Improvement columns show the
improvement over the initial solution (framed using the intelligent solution described above)
in each QoS measure for each QoS level. CLRs and CTDs are averaged for each QoS, and are
listed in order of QoS level.

As shown in Tables 2 and 3, the new solution is not always superior to the initial
solution in all respects. Specifically, the CTD is often worse in the final solution than
initially. However, the overall goodness of the solution has improved: some aspects of
performance have been sacrificed in order to provide improved measures of aspects deemed
more important. In these experiments, CTD was given a comparatively lower priority than
CLR, resulting in decreased levels of performance in the CTD measure.

Some of the percentage improvements listed are extremely large in magnitude. These
values can be misleading, since the initial quantity may be small. Therefore, even though the
percentage is large, the absolute change may be of only marginal significance.

A number of problems were also solved by exhaustive search in order to objectively
determine optimal solutions for comparison to the SAHC solutions. In every case, the SAHC
algorithm found an optimal solution. The problem sizes were necessarily very small, on the
order of 10^6 to 10^7 candidate solutions. It should be noted, however, that exhaustive search
on even these small problems took hours of computation running on a Silicon Graphics Indigo 2
workstation, while the SAHC method was able to arrive at the same solutions in less than one
second.
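
An exhaustive search of the kind used for comparison might be sketched as follows. The sketch
assumes that every queue depth is a non-negative integer and that the depths sum to m, in
which case the number of candidates is given by the standard stars-and-bars count. The
function names and the caller-supplied energy callable are hypothetical.

```python
from itertools import combinations
from math import comb


def count_assignments(m, k):
    """Number of ways to split m cells among k queues (depths may be zero)."""
    return comb(m + k - 1, k - 1)


def all_assignments(m, k):
    """Yield every tuple of k non-negative depths that sums to m."""
    # Choose positions for k - 1 dividers among m + k - 1 slots.
    for bars in combinations(range(m + k - 1), k - 1):
        prev = -1
        depths = []
        for b in bars:
            depths.append(b - prev - 1)     # cells between two dividers
            prev = b
        depths.append(m + k - 2 - prev)     # cells after the last divider
        yield tuple(depths)


def exhaustive_best(m, k, energy):
    """Return the assignment with the lowest energy by checking all of them."""
    return min(all_assignments(m, k), key=energy)
```

Because the candidate count grows combinatorially with both m and k, this approach is only
practical for very small problems, which is consistent with the run times reported above.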

In the above examples, it is assumed that memory could be allocated across all of the
buffers in the network. This works well for initial system design.

In an existing system, however, the buffering memories may not be easily reallocated
between ports. Referring again to FIG. 1, each of the buffering components 16d-16f is
connected to a respective port. After the switch has been designed and built, it may not be
convenient to move memory from one of the buffering elements (e.g., 16d) to another
buffering element (e.g., 16e). Where this is the case, it may still be possible to optimize queue
depths within the individual buffering elements even after the switch has been constructed,
without a shared pool of memory for all buffers on the switch. For example, if each of the
queues 43a-43d (of FIG. 4) is stored in a common memory, the amount of memory allocated
to each of the buffers may be dynamically changed easily. The technique for assigning
queue depths may be the same as that described above, except that fewer queues are analyzed.

FIG. 10 illustrates a buffering unit according to one embodiment of the present
invention, such as the buffering unit 16d of FIG. 1. In this embodiment, a
fabric interface controller 102 handles reception of cells from the network switch fabric 100
(in 16d of FIG. 1, this would correspond to reception of cells from the network switch fabric
12). The fabric interface controller may provide cells to the output queue buffers 103 at the
direction of a buffer controller 106. Similar to the fabric interface controller 102, a port
interface controller 104 handles transmission of cells to, or reception of cells from, the port
105. Both the fabric interface controller 102 and the port interface controller 104 may be
implemented as off-the-shelf devices, or may be integrated into an application specific
integrated circuit (ASIC) that includes all or part of the components shown in FIG. 10.

The output queue buffers 103 may be a single dedicated memory device, several
memory devices, registers, or a portion of a total memory space used within the switch. As
described above, the latter most easily permits assignment and re-aligning of memory among
buffering components associated with individual ports, whereas other embodiments may not
as easily accommodate this.

In one embodiment, the buffer controller 106 performs the control functions of FIGs.
6-8. This may be done by responding to requests from the fabric interface controller 102 and
the port interface controller 104 and controlling the output queue buffers 103 accordingly. In
other embodiments, either or both of the fabric interface controller 102 and port interface
controller 104 perform some or all of these control functions (as illustrated in FIG. 4), so that
a buffer controller 106 is not necessary. In another embodiment, the buffer controller 106
performs the functions of the fabric interface controller 102 and port interface controller 104.

The above embodiments also permit dynamic monitoring of network characteristics
for the switch or port, and reassignment of queue depths on the fly.

FIG. 11 illustrates one embodiment of this process. According to this embodiment,
queue depths are assigned at a step 110. This may be done initially as described above, by
making assumptions or estimates about network characteristics.

At a step 112, the network characteristics are monitored. These characteristics may
correspond to whatever aspects affect the energy function used in the particular embodiment.
For example, in the embodiments described above, mean cell arrival rates (λ), cell drop rates,
cell delay rates, average throughput, etc. may be measured. This monitoring may be done by
the buffer controller, a separate monitoring module, a network controller or other mechanism.

Periodically, the queue depths may be reassigned, by returning to step 110. This may
be done at fixed periods of time (e.g., once a day), or may be done whenever a change in
network characteristics is sensed. By logging the network characteristics, a schedule of queue
depths may be created. This may be useful where the characteristics of the network vary over
time (e.g., where network characteristics in the evening are different from network
characteristics in the morning).
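
The monitoring and reassignment process might be sketched as follows. The change test (a
relative tolerance on the monitored rates), the form of the log (a mapping from a period label
to a list of rate samples) and the caller-supplied assign callable standing in for step 110 are
all assumptions made for illustration. The text does not specify how a change in network
characteristics is sensed or how a schedule is stored.

```python
def needs_reassignment(old_rates, new_rates, tolerance=0.10):
    """Return True when any monitored rate moved by more than `tolerance`."""
    for old, new in zip(old_rates, new_rates):
        base = max(abs(old), 1e-12)          # avoid division by zero
        if abs(new - old) / base > tolerance:
            return True
    return False


def build_schedule(log, assign):
    """Map each logged period to queue depths for that period's mean rates.

    `log` maps a period label (for example "morning") to a list of samples,
    each sample being a list with one monitored rate per queue.
    """
    schedule = {}
    for period, samples in log.items():
        n = len(samples)
        queues = len(samples[0])
        mean_rates = [sum(s[q] for s in samples) / n for q in range(queues)]
        schedule[period] = assign(mean_rates)   # hypothetical step 110
    return schedule
```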

The process of assigning queue depths 110 may be performed by buffer controllers, as
described above with reference to FIG. 10. Even where all of the buffers are held in a
common memory and queue depths may be reassigned by sharing memory across more than
one port, one or more buffer controllers may be responsible for assigning queue depths. In
alternative embodiments, a separate processor may be provided for performing or
coordinating the queue depth assignment, or this process may be performed by a
network controller or other facility.

The various methods above may be implemented as software on a floppy disk,
compact disk, or other storage device, which controls a computer. The computer may be a
general purpose computer such as a workstation, mainframe or personal computer, that
performs the steps of the disclosed processes or implements equivalents to the disclosed block
diagrams. Such a computer typically includes a central processing unit coupled to a random
access memory and a program memory by a data bus of some form. The data bus may also be
coupled to the output queue. The buffer controller 106 may, for example, perform these
functions and be implemented in this manner. Alternatively, the various methods may be
implemented in hardware, such as on an ASIC or other hardware implementation. Of course, in
either hardware or software embodiments, functions performed by the above elements and the
varying steps may be combined in varying arrangements of hardware and software.

Having thus described at least one illustrative embodiment of the invention, various
modifications and improvements will readily occur to those skilled in the art and are intended
to be within the scope of the invention. Accordingly, the foregoing description is by way of
example only and is not intended as limiting. The invention is limited only as defined in the
following claims and the equivalents thereto.

What is claimed is:

CLAIMS



1. A buffer element for a communication network, the buffer element comprising:

a first buffer memory to store communication units corresponding to a first quality of
service level;

a second buffer memory to store communication units corresponding to a second
quality of service level; and

a buffer manager, coupled to the first buffer memory and the second buffer memory, to
selectively store communication units in the first buffer and the second buffer based on a
corresponding quality of service level of the communication units, and to retrieve
communication units from the first buffer memory and the second buffer memory.

2. The buffer element of claim 1, wherein the buffer manager comprises: 

a sorter unit coupled to the first buffer memory and the second buffer memory to
selectively store a communication unit in the first buffer or the second buffer based on a
quality of service level of the communication unit.

3. The buffer element of claim 1, wherein the first buffer memory has a first depth, the 
second buffer memory has a second depth, and the buffer element further comprises: 

a depth adjuster to adjust the first depth and the second depth.

4. The buffer element of claim 3, wherein the depth adjuster comprises:

means for iteratively searching possible depth assignments to determine the first depth
and the second depth.

5. The buffer element of claim 4, wherein the means for searching comprises:
means for performing a steepest ascent hill climbing search.

6. The buffer element of claim 3, wherein the depth adjuster comprises:
means for determining performance characteristics of the switch.

7. The buffer element of claim 1, wherein the first buffer memory and the second buffer
memory are regions of memory in a contiguous random access memory device.

8. The buffer element of claim 1, wherein the communication units are ATM cells.

9. A switch for a communication network, the switch comprising: 
a plurality of ports; 

a first buffer memory coupled to one of the ports to store communication units
corresponding to a first quality of service level; and

a second buffer memory coupled to the one of the ports to store communication units
corresponding to a second quality of service level.

10. The switch of claim 9, further comprising:

a buffer manager, coupled to the first buffer memory and the second buffer memory, to
selectively store communication units in the first buffer and the second buffer based on a
corresponding quality of service level of the communication units, and to retrieve
communication units from the first buffer memory and the second buffer memory.

11. The switch of claim 9, wherein:

the plurality of ports comprises a plurality of output ports that output communication
units from the switch to the network; and

the first buffer memory and the second buffer memory are coupled to one of the
plurality of output ports, to store communication units to be output to the one of the plurality
of output ports.

12. The switch of claim 11, wherein:

each of the plurality of output ports has a respective first buffer memory and a 
respective second buffer memory to store communication units transmitted across the 
respective output port. 
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13. The switch of claim 12, wherein: 

each of the plurality of output ports has a respective buffer manager to selectively
store communication units in the respective first buffer and the respective second buffer based
on a corresponding quality of service level of the communication units, and to retrieve
communication units from the respective first buffer memory and the respective second buffer
memory.

14. The switch of claim 9, wherein: 

the plurality of ports comprises a plurality of input ports that receive communication
units from the switch to the network; and

the first buffer memory and the second buffer memory are coupled to one of the 
plurality of input ports, to store communication units received on the one of the plurality of 
input ports. 

15. The switch of claim 14, wherein:

each of the plurality of input ports has a respective first buffer memory and a 
respective second buffer memory to store communication units transmitted across the 
respective input port. 

16. The switch of claim 15, wherein:

each of the plurality of input ports has a respective buffer manager to selectively store
communication units in the respective first buffer and the respective second buffer based on a
corresponding quality of service level of the communication unit, and to retrieve
communication units from the respective first buffer memory and the respective second buffer
memory.

17. The switch of claim 15, wherein the communication units are ATM cells.

18. A method of buffering communication units in a communication network, the method
comprising steps of:

assigning a queue depth for each of a plurality of queues, each queue being designated
to store communication units of a predetermined quality of service level;

providing the plurality of queues, each queue having the corresponding assigned
depth;

selecting one of the queues to receive a communication unit based on a quality of
service level associated with the communication unit; and

storing the communication unit in the selected queue.

19. The method of claim 18, further comprising a step of adjusting the queue depths.

20. The method of claim 18, further comprising steps of:

monitoring a characteristic in the communication network; and

adjusting the assigned queue depths based on the monitored characteristic.

21. The method of claim 20, wherein the characteristic is selected from the group
consisting of communication unit arrival rate for one of the quality of service levels,
communication unit processing rate for one of the quality of service levels, communication
unit loss rate for one of the quality of service levels and communication unit delay rate for one
of the quality of service levels.



22. The method of claim 18, wherein each of the plurality of queues stores communication
units for a single port in a communication network switch.

23. The method of claim 22, wherein the single port is an output port.

24. The method of claim 18, wherein the plurality of queues stores the communication
units for each port of a switch in the communication network.

25. The method of claim 18, wherein the assigning step comprises a step of:
determining a priority level for dropped communication units for each of the quality of
service levels.






WO 99/57858 PCT/US99/09853 


26. The method of claim 18, wherein the assigning step comprises a step of:

assigning a priority level for communication unit delay for each of the quality of
service levels.

27. The method of claim 18, wherein the assigning step comprises a step of:
performing a search of possible depth assignments.

28. The method of claim 27, wherein the performing step comprises a step of:
performing a steepest ascent hill climbing search.

29. The method of claim 18, wherein the communication units are ATM cells.



30. A method of selecting a communication unit, for transmission in a communication
network that provides a plurality of quality of service levels, the communication unit being
selected from a plurality of communication units stored in a buffer, the buffer including a
plurality of queues, each queue corresponding to one of the quality of service levels, the
method comprising steps of:

identifying the queue with the highest corresponding quality of service level and
which is not empty; and

selecting the communication unit from the identified queue.

31. A method of storing a communication unit in a buffer, the communication unit having
one of a plurality of quality of service levels, the buffer including a plurality of queues, each
queue corresponding to one of the quality of service levels, the method comprising steps of:

determining the quality of service level of the communication unit; and

storing the communication unit in the queue having the corresponding quality of
service level of the communication unit.



32. The method of claim 31, further comprising a step of:

dropping the communication unit when the queue having the quality of service level of 
the communication unit is full. 
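
By way of illustration only, and without limiting the claims, the storing, dropping and
selecting steps recited above might be sketched as follows. The class and method names are
hypothetical. The sketch assumes that queue index 0 corresponds to the highest quality of
service level and that units within a queue are served first in, first out.

```python
from collections import deque


class QosBuffer:
    """One queue per quality of service level, each with an assigned depth."""

    def __init__(self, depths):
        self.depths = list(depths)                  # capacity of each queue
        self.queues = [deque() for _ in self.depths]
        self.dropped = [0] * len(self.depths)       # drop count per level

    def store(self, unit, level):
        """Store a unit in the queue for its level, or drop it if full."""
        queue = self.queues[level]
        if len(queue) >= self.depths[level]:
            self.dropped[level] += 1
            return False
        queue.append(unit)
        return True

    def select(self):
        """Return a unit from the highest-level non-empty queue, else None."""
        for queue in self.queues:                   # index 0 is checked first
            if queue:
                return queue.popleft()
        return None
```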
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